
    Database Technology for Processing Temporal Data


    Leveraging range joins for the computation of overlap joins

    Joins are essential and potentially expensive operations in database management systems. When data is associated with time periods, joins commonly include predicates that require pairs of argument tuples to overlap in time in order to qualify for the result. Our goal is to enable built-in system support for such joins. In particular, we present an approach where overlap joins are formulated as unions of range joins, which are more general-purpose than overlap joins, are thus useful in their own right, and are well supported by B+-trees. The approach is sufficiently flexible that it also supports joins with additional equality predicates, as well as open, closed, and half-open time periods over discrete and continuous domains, thus offering both generality and simplicity, which is important in a system setting. We provide both a stand-alone solution that performs on par with the state of the art and a DBMS-embedded solution that exploits standard indexing and clearly outperforms existing DBMS solutions that depend on specialized indexing techniques. We offer both analytical and empirical evaluations of the proposals. The empirical study includes comparisons with pertinent existing proposals and offers detailed insight into the performance characteristics of the proposals.
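    The decomposition in the abstract can be sketched as follows. This is a minimal illustrative version, not the paper's implementation: relations are plain lists of half-open periods (start, end), the function names are invented, and the two range joins run as naive loops where a DBMS would use B+-tree range scans.

```python
def range_join_start_in(outer, inner):
    """Range join: pairs (o, i) where o's start point lies in i's period.
    Illustrative nested loop; a DBMS would evaluate this with a B+-tree."""
    return [(o, i) for o in outer for i in inner
            if i[0] <= o[0] < i[1]]

def overlap_join(r, s):
    """Overlap join expressed as a union of two range joins:
    case 1: r.start falls inside s's period;
    case 2: s.start falls inside r's period, strictly after r.start
            (strict, so the two cases are disjoint and no pair repeats)."""
    part1 = range_join_start_in(r, s)
    part2 = [(b, a) for (a, b) in range_join_start_in(s, r)
             if b[0] < a[0]]     # b is the r-tuple, a the s-tuple
    return part1 + part2

# Hypothetical data: two relations of half-open periods [start, end).
r = [(1, 5), (4, 9)]
s = [(3, 6), (8, 12)]
# Reference result via the direct overlap predicate.
expected = [(a, b) for a in r for b in s if a[0] < b[1] and b[0] < a[1]]
assert sorted(overlap_join(r, s)) == sorted(expected)
```

    The two cases partition the overlap condition by which period starts first, so their union is exactly the overlap join and each qualifying pair is produced once.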

    Disjoint interval partitioning

    In databases with time interval attributes, query processing techniques based on sort-merge or sort-aggregate deteriorate. This happens because no total order exists for intervals, so either the start or the end point must be used for sorting. Doing so leads to inefficient solutions with many unproductive comparisons that do not produce an output tuple. Even if just one tuple with a long interval is present in the data, the number of unproductive comparisons of sort-merge and sort-aggregate becomes quadratic. In this paper we propose disjoint interval partitioning (DIP), a technique to efficiently perform sort-based operators on interval data. DIP divides an input relation into the minimum number of partitions such that all tuples in a partition are non-overlapping. The absence of overlapping tuples guarantees efficient sort-merge computations without backtracking. With DIP the number of unproductive comparisons is linear in the number of partitions. In contrast to current solutions with inefficient random accesses to the active tuples, DIP fetches the tuples in a partition sequentially. We illustrate the generality and efficiency of DIP by describing and evaluating three basic database operators over interval data: join, anti-join, and aggregation.
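    The partitioning step can be sketched with a standard greedy interval-partitioning scheme, which yields the minimum number of pairwise-disjoint partitions. This is only a sketch of the partitioning idea, assuming half-open periods; the function name and data layout are invented, and the paper's operator additionally covers how partitions feed into merge-style join, anti-join, and aggregation.

```python
import heapq

def dip_partition(tuples):
    """Greedy sketch of disjoint interval partitioning: sort by start,
    then place each tuple into an existing partition whose latest end
    point does not exceed the tuple's start (half-open periods).
    A min-heap on the latest end point finds such a partition in
    O(log p); the result uses the minimum number of partitions, and
    each partition contains only pairwise non-overlapping tuples."""
    partitions = []     # list of lists of (start, end)
    heap = []           # (latest_end_in_partition, partition_index)
    for s, e in sorted(tuples):
        if heap and heap[0][0] <= s:       # reuse a free partition
            _, idx = heapq.heappop(heap)
            partitions[idx].append((s, e))
        else:                              # all partitions overlap: open a new one
            idx = len(partitions)
            partitions.append([(s, e)])
        heapq.heappush(heap, (e, idx))
    return partitions

# One long tuple forces a second partition; the short ones share one.
parts = dip_partition([(1, 10), (2, 3), (4, 6), (7, 9), (11, 12)])
assert len(parts) == 2
```

    Because each partition is scanned in start order and is overlap-free, a downstream sort-merge over partition pairs never needs to backtrack.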

    Computing the Fourier Transformation over Temporal Data Streams (Invited Talk)

    In radio astronomy, the sky is continuously scanned to collect frequency information about celestial objects. The inverse 2D Fourier transformation is used to generate images of the sky from the collected frequency information. We propose an algorithm that incrementally refines images by processing frequency information as it arrives in a temporal data stream. A direct implementation of the refinement with the discrete Fourier transformation requires O(N^2) complex multiplications to process an element of the stream. We propose a new algorithm that avoids recomputations and only requires O(N) complex multiplications.
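    The complexity argument can be illustrated in one dimension: when a stream element carries a single new frequency coefficient, folding it into the running image touches each of the N pixels once, i.e. O(N) multiplications, whereas recomputing the full inverse DFT per element costs O(N^2). This is a 1D sketch under that assumption (the paper targets the 2D case); all names are hypothetical.

```python
import cmath

def incremental_idft(stream, N):
    """1D sketch of incremental image refinement: each stream element
    (k, F_k) contributes the frequency coefficient F_k at index k.
    Folding it into the running image costs O(N) complex
    multiplications, instead of recomputing the full inverse DFT."""
    image = [0j] * N
    for k, F in stream:
        for x in range(N):        # O(N) update per stream element
            image[x] += F * cmath.exp(2j * cmath.pi * k * x / N) / N
        yield list(image)         # current refined image

def idft(coeffs):
    """Reference inverse DFT, O(N^2) per full recomputation."""
    N = len(coeffs)
    return [sum(coeffs[k] * cmath.exp(2j * cmath.pi * k * x / N)
                for k in range(N)) / N
            for x in range(N)]

N = 8
coeffs = [complex(k, -k) for k in range(N)]
*_, final = incremental_idft(list(enumerate(coeffs)), N)
# After all coefficients arrive, the refined image matches the full inverse DFT.
assert all(abs(a - b) < 1e-9 for a, b in zip(final, idft(coeffs)))
```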

    Query Results over Ongoing Databases that Remain Valid as Time Passes By

    The ongoing time point now is used to state that a tuple is valid from its start point onward. Ongoing time points have far-reaching implications for database systems since they change continuously as time passes by. State-of-the-art approaches deal with ongoing time points by instantiating them to the reference time. The instantiation yields query results that are only valid at the chosen time and are invalidated as time passes by. We propose a solution that keeps ongoing time points uninstantiated during query processing. We do so by evaluating predicates and functions at all possible reference times. This renders query results independent of a specific reference time and yields results that remain valid as time passes by. As query results, we propose ongoing relations that include a reference time attribute whose value is restricted by predicates and functions on ongoing attributes. We describe and evaluate an efficient implementation of ongoing data types and operations in PostgreSQL.
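    The idea of evaluating a predicate at all reference times can be sketched for a single case: a tuple valid over the ongoing period [start, now), checked for overlap with a fixed query period. Instead of a boolean at one instantiated time, the result is the set of reference times at which the predicate holds. This is a minimal illustration with invented names, assuming half-open periods; the paper defines full ongoing data types and operations.

```python
INF = float("inf")

def overlaps_at_all_times(start, query):
    """Sketch: a tuple valid over the ongoing period [start, now) is
    instantiated at reference time t as [start, t).  Rather than pick
    one t, return the interval of reference times t at which the tuple
    overlaps the fixed query period [qs, qe); that answer stays valid
    as time passes.  Returns None if no reference time qualifies."""
    qs, qe = query
    if start >= qe:
        return None               # overlap impossible at any time
    # [start, t) and [qs, qe) overlap iff start < qe, qs < t, start < t,
    # i.e. for every reference time t strictly greater than max(qs, start).
    return (max(qs, start), INF)

# Check against per-time instantiation: overlap of [5, t) with [3, 10).
res = overlaps_at_all_times(5, (3, 10))
for t in range(20):
    instantiated = 5 < t and 5 < 10 and 3 < t
    assert (res is not None and t > res[0]) == instantiated
```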

    Temporal Query Processing


    Temporal Coalescing


    CORE: Nonparametric Clustering of Large Numeric Databases

    Current clustering techniques are able to identify arbitrarily shaped clusters in the presence of noise, but depend on carefully chosen model parameters. The choice of model parameters is difficult: it depends on the data and the clustering technique at hand, and finding good model parameters often requires time-consuming human interaction. In this paper we propose CORE, a new nonparametric clustering technique that explicitly computes the local maxima of the density and represents them with cores. CORE uses an adaptive grid and gradients to define and compute the cores of clusters. The incrementally constructed adaptive grid and the gradients make the identification of cores robust, scalable, and independent of small density fluctuations. Our experimental studies show that CORE, without any carefully chosen model parameters, produces better-quality clusterings than related techniques and is efficient for large datasets.
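    The core idea, that cluster labels come from local maxima of a grid-based density rather than from tuned parameters, can be illustrated with a loose 1D sketch. This is not the CORE algorithm: it uses a fixed-width grid where CORE builds an incremental adaptive grid, and all names are invented.

```python
from collections import Counter

def grid_cluster(points, cell=1.0):
    """Loose 1D sketch of density-core clustering: bucket points into
    grid cells, treat cells that are local density maxima as cores, and
    label each point by hill-climbing along the density gradient to its
    core.  (CORE itself uses an incrementally built adaptive grid and is
    robust to small density fluctuations; this is only illustrative.)"""
    density = Counter(int(p // cell) for p in points)
    def climb(c):
        while True:
            # Move to a strictly denser neighbor cell; ties keep c,
            # so the walk terminates at a local density maximum (a core).
            best = max((c - 1, c, c + 1),
                       key=lambda n: (density[n], -abs(n - c)))
            if best == c:
                return c
            c = best
    return {p: climb(int(p // cell)) for p in points}

labels = grid_cluster([0.1, 0.2, 0.3, 0.9, 5.0, 5.1, 5.2, 5.8])
assert labels[0.1] == labels[0.9]   # same density core
assert labels[0.1] != labels[5.0]   # separate cluster
```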

    Efficient evaluation of ad-hoc range aggregates

    θ-MDA is a flexible and efficient operator for complex ad-hoc multi-dimensional aggregation queries. It separates the specification of aggregation groups, for which aggregate values are computed (base table b), from the specification of aggregation tuples, from which aggregate values are computed. Aggregation tuples are subsets of the detail table r and are defined by a general θ-condition. θ-MDA requires one scan of r, during which the aggregates are incrementally updated. In this paper, we propose a two-step evaluation strategy for θ-MDA to optimize the computation of ad-hoc range aggregates by reducing them to point aggregates. The first step scans r and computes point aggregates as a partial intermediate result x̃, which can be done efficiently. The second step combines the point aggregates into the final aggregates. This transformation significantly reduces the number of incremental updates to aggregates and reduces the runtime from O(|r|·|b|) to O(|r|), provided that |b| < √|r| and |x̃| ≈ |b|, which is common for OLAP. An empirical evaluation confirms the analytical results and shows the effectiveness of our optimization: range queries are evaluated with almost the same efficiency as point queries.
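    The two-step strategy can be sketched for SUM over one dimension. This is a minimal illustration with invented names: step 1 is the single scan producing point aggregates x̃, step 2 combines them per range; a real implementation would combine via sorted point aggregates or prefix sums rather than the simple filter shown here.

```python
from collections import defaultdict

def range_sums_two_step(detail, ranges):
    """Two-step range-aggregate sketch: step 1 scans the detail table r
    once, accumulating point aggregates per distinct key (the partial
    intermediate result x~); step 2 combines those point aggregates into
    one range aggregate per base tuple.  Incremental updates drop from
    one per (detail tuple, matching range) pair to one per detail tuple."""
    point = defaultdict(float)            # step 1: point aggregates x~
    for key, val in detail:
        point[key] += val                 # single update per detail tuple
    return [sum(v for k, v in point.items() if lo <= k <= hi)  # step 2
            for lo, hi in ranges]

# Hypothetical detail table r = (key, measure) and base ranges b.
detail = [(1, 10.0), (2, 5.0), (2, 5.0), (4, 1.0), (7, 2.0)]
ranges = [(1, 2), (2, 7), (5, 6)]
assert range_sums_two_step(detail, ranges) == [20.0, 13.0, 0.0]
```

    Since |x̃| (distinct keys) is typically far smaller than |r| in OLAP workloads, the combine step is cheap relative to the scan, which is what makes range queries nearly as fast as point queries.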